
Stop using pull_request: labeled to gate optional tests #18446

Merged
trask merged 2 commits into open-telemetry:main from trask:fix-build-pr-label-concurrency on May 1, 2026


trask commented Apr 30, 2026

Member

Note

Sorry for the wall of text below, but I went down a lot of wrong paths here before ending up with this compromise, and I'm leaving this comment for myself for when I think I can do better later on.

Why

Optional test variants (test openj9, test windows, test native) were gated by GitHub PR labels via pull_request: [labeled]. We tried three progressively more elaborate ways to make the labeled trigger work, and all of them broke:

  1. Separate rebuild-pull-request-on-label.yml that re-invoked the build via workflow_call when a test * label was added. That meant two workflows and two concurrency groups, so the full build ran twice in parallel every time the labeler auto-applied test native (#18229, "Remove deprecated redact-query-parameters and db-sqlcommenter configurations").
  2. Shared fixed concurrency group so the labeled run cancelled the initial run (#18235, "Deduplicate PR build runs when a test label is auto-added"). The cancelled run's check_suite then sat on the PR as "canceled / failed" check_runs, shadowing the real build in the PR Checks UI (#18295, "Unify PR build into a single workflow with a dispatcher").
  3. Unified into one workflow with pull_request: [labeled] plus a job-level if: filtering to test * labels (#18295). But concurrency: is evaluated before the job-level if: filter, so adding any label (by renovate, dependabot, code-review-sweep, the labeler, or a human) entered the per-PR group and cancelled the in-progress real build. Concrete failure: PR #18441 ("Code review sweep (run 25154254051)"), run 25156724872, where every job ended within ~2 seconds with no steps executed. And even after a conditional concurrency group keyed off the label, the labeled run's check_suite still displaced the real build's check_suite in the PR Checks UI (GitHub keys that view by workflow file, and the latest check_suite wins).
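To make the ordering problem in attempt 3 concrete, here is a minimal sketch of the failing shape. It is a reduction under assumptions, not the actual workflow from this repository: the workflow name, the group key, and the build step are all hypothetical.

```yaml
# Hypothetical reduction of attempt 3; names and the group key are assumptions.
name: Build pull request

on:
  pull_request:
    types: [opened, synchronize, reopened, labeled]

# The workflow-level concurrency group applies to the whole run,
# before any job-level `if:` gets a chance to filter the event.
concurrency:
  group: build-pr-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  build:
    # Intended to ignore unrelated labels. By the time this is evaluated,
    # the labeled run has already joined the group and cancelled the
    # in-progress build for the same PR.
    if: ${{ github.event.action != 'labeled' || startsWith(github.event.label.name, 'test ') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew build   # placeholder build command
```

With this shape, a label such as one added by renovate produces a run whose only job is skipped, yet the earlier real build is still cancelled.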

All three failures share a root cause: pull_request: [labeled] was never meant to be a build-configuration trigger.

What

Drop labeled from the trigger entirely. Read labels from github.event.pull_request.labels (present on every pull_request event) and gate optional jobs with contains(...). Skipped required jobs satisfy branch protection so auto-merge still gates correctly. test native detection (formerly the labeler's glob) moves inline as a resolve-native job in the build.
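A minimal sketch of the label-gating approach described above, assuming hypothetical job names and placeholder steps (the real workflow is not reproduced here). Only the `contains(...)` expression over `github.event.pull_request.labels` is taken from the description.

```yaml
# Sketch only: job names and steps are placeholders.
on:
  pull_request:
    # No `labeled` here, so adding a label never starts a run.
    types: [opened, synchronize, reopened]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew build   # placeholder build command

  test-openj9:
    # The labels array is part of every pull_request payload, so the label
    # is read at the moment the build starts rather than when it is applied.
    if: ${{ contains(github.event.pull_request.labels.*.name, 'test openj9') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "openj9 tests would run here"   # placeholder

  test-windows:
    if: ${{ contains(github.event.pull_request.labels.*.name, 'test windows') }}
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "windows tests would run here"   # placeholder
```

When the label is absent the gated job is reported as skipped, which is the state the description relies on to satisfy branch protection.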

Trade-off

Adding a test openj9 / test windows label doesn't retrigger the build. Push another commit or close/reopen the PR to pick it up. A new comment-on-test-label.yml workflow posts a reminder comment when one of those labels is added so this isn't surprising. (Safe under pull_request_target because that workflow doesn't check out or run any PR code — only gh pr comment.)
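The reminder workflow might look roughly like the sketch below. Only the file name comment-on-test-label.yml, the pull_request_target trigger, and the use of gh pr comment come from the description; the job name, permissions block, environment variable names, and comment wording are assumptions.

```yaml
# Sketch of a reminder workflow; everything except the trigger and the
# `gh pr comment` call is an assumption.
name: Comment on test label

on:
  pull_request_target:
    types: [labeled]

permissions:
  pull-requests: write

jobs:
  comment:
    if: ${{ github.event.label.name == 'test openj9' || github.event.label.name == 'test windows' }}
    runs-on: ubuntu-latest
    steps:
      # Deliberately no checkout step: no PR code is fetched or executed,
      # which is what keeps pull_request_target safe here.
      - name: Post reminder
        env:
          GH_TOKEN: ${{ github.token }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          LABEL: ${{ github.event.label.name }}
        run: |
          gh pr comment "$PR_NUMBER" \
            --repo "$GITHUB_REPOSITORY" \
            --body "The '$LABEL' label is read when the build starts. Push another commit or close and reopen the PR to pick it up."
```

Passing the label name through an environment variable instead of interpolating it into the script avoids shell injection through event data.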

trask requested a review from a team as a code owner April 30, 2026 15:47
trask added the area:build label (Issues about build infra, both local and central) Apr 30, 2026
trask marked this pull request as draft April 30, 2026 15:52
trask changed the title from "Prevent label-triggered runs from cancelling the main PR build" to "Run label-triggered PR builds in a separate workflow file" Apr 30, 2026
trask added the test native label and removed the area:build label Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch from 23fd63b to f5f7388 April 30, 2026 17:27
trask changed the title from "Run label-triggered PR builds in a separate workflow file" to "Add 'automated code review' label after PR creation, using GITHUB_TOKEN" Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch from f5f7388 to 52e5a6c April 30, 2026 17:41
trask changed the title from "Add 'automated code review' label after PR creation, using GITHUB_TOKEN" to "Don't let irrelevant label-adds cancel the in-progress PR build" Apr 30, 2026
trask added the area:tests label and removed the test native label Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch from 52e5a6c to bab64fd April 30, 2026 17:54
trask changed the title from "Don't let irrelevant label-adds cancel the in-progress PR build" to "Move label-triggered PR builds to a separate workflow file" Apr 30, 2026
trask added the area:build and test native labels and removed the area:tests label Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch from bab64fd to b64786a April 30, 2026 20:15
trask changed the title from "Move label-triggered PR builds to a separate workflow file" to "Replace test-* labels with /test comment commands" Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch 2 times, most recently from 7c7f79d to b19e80d April 30, 2026 21:35
trask changed the title from "Replace test-* labels with /test comment commands" to "Stop using pull_request: labeled to gate optional tests" Apr 30, 2026
trask force-pushed the fix-build-pr-label-concurrency branch from b19e80d to b0b027c April 30, 2026 22:35
The `labeled` trigger caused two issues that combined into recurring CI breakage on PRs that received any label (from renovate, dependabot, code-review-sweep, the labeler, or a human):

* GitHub evaluates a workflow's `concurrency:` group before its job-level `if:` filter, so a labeled run cancelled the in-progress real build before being filtered out.
* Even after a conditional concurrency-group fix, the labeled run's check_suite displaced the in-progress real build's check_suite in the PR Checks UI (GitHub keys that view by workflow file, latest check_suite wins).
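For reference, a conditional concurrency group of the kind mentioned in the second bullet could be sketched as follows. The group names and the exact condition are assumptions; the source does not show the expression that was actually tried.

```yaml
# Hypothetical conditional group: unrelated label events get a unique,
# per-run group so they cannot cancel anything; every other event shares
# the per-PR group.
concurrency:
  group: ${{ (github.event.action == 'labeled' && !startsWith(github.event.label.name, 'test ')) && format('label-noop-{0}', github.run_id) || format('build-pr-{0}', github.event.pull_request.number) }}
  cancel-in-progress: true
```

This stops the cancellation, but the labeled run still creates its own check_suite for the same workflow file, which is the displacement problem the bullet describes.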

Concrete failure: PR open-telemetry#18441, run https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/25156724872 — every job ended within ~2 seconds with no steps executed.

Fix: drop `labeled` from the trigger entirely. Read labels from `github.event.pull_request.labels` (which is present on every `pull_request` event) and gate optional jobs by `contains(...)`. Skipped jobs that are listed as required satisfy branch protection, so auto-merge still works.

* `test openj9` / `test windows` labels: still consulted, but only at the moments the build naturally runs (opened / synchronize / reopened). Applying a label after the last push no longer re-triggers the build; to pick up the label, close/reopen the PR or push another commit. This trade-off avoids the labeled-event race and the wasted reruns from non-test labels.
* `test native` is now detected inline from the PR's changed paths via a new `resolve-native` job that mirrors the former `.github/labeler.yml` `test native` rule. The label is no longer auto-applied; the manual label is still honored as a force-enable.
* `.github/labeler.yml` and `.github/workflows/label.yml` are deleted (the labeler's only rule was the `test native` one).
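A rough sketch of how an inline `resolve-native` job could combine changed-path detection with the manual label as a force-enable. The path pattern, output name, step id, and downstream job are all hypothetical; the former labeler rule is not shown in this PR text, so the pattern below is a placeholder rather than the real glob.

```yaml
# Sketch only: the path pattern and all names except `resolve-native`
# are assumptions.
jobs:
  resolve-native:
    runs-on: ubuntu-latest
    outputs:
      run-native: ${{ steps.detect.outputs.run-native }}
    steps:
      - id: detect
        env:
          GH_TOKEN: ${{ github.token }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          # Renders as "true" or "false"; the manual label forces native tests on.
          FORCE: ${{ contains(github.event.pull_request.labels.*.name, 'test native') }}
        run: |
          # Placeholder pattern standing in for the former labeler rule.
          pattern='^path/to/native/'
          if [ "$FORCE" = "true" ] || \
             gh pr diff "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --name-only | grep -Eq "$pattern"; then
            echo "run-native=true" >> "$GITHUB_OUTPUT"
          else
            echo "run-native=false" >> "$GITHUB_OUTPUT"
          fi

  test-native:
    needs: resolve-native
    if: ${{ needs.resolve-native.outputs.run-native == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "native tests would run here"   # placeholder
```

Resolving the decision inside the build keeps everything in one run and one check_suite, so no second event is needed to turn the native tests on.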
trask force-pushed the fix-build-pr-label-concurrency branch from b0b027c to b5f15aa April 30, 2026 22:41
trask removed the area:build label Apr 30, 2026
trask marked this pull request as ready for review May 1, 2026 03:13

zeitlinger left a comment

Member


I wish that could be simpler - but I can't think of anything

trask merged commit 879ce30 into open-telemetry:main May 1, 2026
95 checks passed
trask deleted the fix-build-pr-label-concurrency branch May 1, 2026 14:35

Labels

test native This label can be applied to PRs to trigger them to run native tests


2 participants